Mouse model project

Valeriia Ladyhina

2022-04-10

Introduction

This project…

Installation of required packages

# Install (if missing) and attach every package used in this report
requiredPackages <- c('tidyverse', 'ggplot2', 'prettydoc', 'dplyr', 'RColorBrewer', 'car',
                      'readxl', 'olsrr', 'multcomp', 'ggcorrplot', 'vegan')
for (p in requiredPackages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p, repos = "https://cloud.r-project.org")
  }
  library(p, character.only = TRUE)
}

1. Dataset description

Load the data

data_mice <- read_xls("../data_folder/Data_Cortex_Nuclear.xls")

Description of the data

The data set consists of the expression levels of 77 proteins/protein modifications that produced detectable signals in the nuclear fraction of the cortex. There are 38 control mice and 34 trisomic mice. Each genotype group is further divided into 4 subgroups according to behavior (C/S: context-shock, or S/C: shock-context) and treatment (memantine or saline), giving 8 classes in total. Additional information about this data set can be found here

str(data_mice)
## tibble [1,080 × 82] (S3: tbl_df/tbl/data.frame)
##  $ MouseID        : chr [1:1080] "309_1" "309_2" "309_3" "309_4" ...
##  $ DYRK1A_N       : num [1:1080] 0.504 0.515 0.509 0.442 0.435 ...
##  $ ITSN1_N        : num [1:1080] 0.747 0.689 0.73 0.617 0.617 ...
##  $ BDNF_N         : num [1:1080] 0.43 0.412 0.418 0.359 0.359 ...
##  $ NR1_N          : num [1:1080] 2.82 2.79 2.69 2.47 2.37 ...
##  $ NR2A_N         : num [1:1080] 5.99 5.69 5.62 4.98 4.72 ...
##  $ pAKT_N         : num [1:1080] 0.219 0.212 0.209 0.223 0.213 ...
##  $ pBRAF_N        : num [1:1080] 0.178 0.173 0.176 0.176 0.174 ...
##  $ pCAMKII_N      : num [1:1080] 2.37 2.29 2.28 2.15 2.13 ...
##  $ pCREB_N        : num [1:1080] 0.232 0.227 0.23 0.207 0.192 ...
##  $ pELK_N         : num [1:1080] 1.75 1.6 1.56 1.6 1.5 ...
##  $ pERK_N         : num [1:1080] 0.688 0.695 0.677 0.583 0.551 ...
##  $ pJNK_N         : num [1:1080] 0.306 0.299 0.291 0.297 0.287 ...
##  $ PKCA_N         : num [1:1080] 0.403 0.386 0.381 0.377 0.364 ...
##  $ pMEK_N         : num [1:1080] 0.297 0.281 0.282 0.314 0.278 ...
##  $ pNR1_N         : num [1:1080] 1.022 0.957 1.004 0.875 0.865 ...
##  $ pNR2A_N        : num [1:1080] 0.606 0.588 0.602 0.52 0.508 ...
##  $ pNR2B_N        : num [1:1080] 1.88 1.73 1.73 1.57 1.48 ...
##  $ pPKCAB_N       : num [1:1080] 2.31 2.04 2.02 2.13 2.01 ...
##  $ pRSK_N         : num [1:1080] 0.442 0.445 0.468 0.478 0.483 ...
##  $ AKT_N          : num [1:1080] 0.859 0.835 0.814 0.728 0.688 ...
##  $ BRAF_N         : num [1:1080] 0.416 0.4 0.4 0.386 0.368 ...
##  $ CAMKII_N       : num [1:1080] 0.37 0.356 0.368 0.363 0.355 ...
##  $ CREB_N         : num [1:1080] 0.179 0.174 0.174 0.179 0.175 ...
##  $ ELK_N          : num [1:1080] 1.87 1.76 1.77 1.29 1.32 ...
##  $ ERK_N          : num [1:1080] 3.69 3.49 3.57 2.97 2.9 ...
##  $ GSK3B_N        : num [1:1080] 1.54 1.51 1.5 1.42 1.36 ...
##  $ JNK_N          : num [1:1080] 0.265 0.256 0.26 0.26 0.251 ...
##  $ MEK_N          : num [1:1080] 0.32 0.304 0.312 0.279 0.274 ...
##  $ TRKA_N         : num [1:1080] 0.814 0.781 0.785 0.734 0.703 ...
##  $ RSK_N          : num [1:1080] 0.166 0.157 0.161 0.162 0.155 ...
##  $ APP_N          : num [1:1080] 0.454 0.431 0.423 0.411 0.399 ...
##  $ Bcatenin_N     : num [1:1080] 3.04 2.92 2.94 2.5 2.46 ...
##  $ SOD1_N         : num [1:1080] 0.37 0.342 0.344 0.345 0.329 ...
##  $ MTOR_N         : num [1:1080] 0.459 0.424 0.425 0.429 0.409 ...
##  $ P38_N          : num [1:1080] 0.335 0.325 0.325 0.33 0.313 ...
##  $ pMTOR_N        : num [1:1080] 0.825 0.762 0.757 0.747 0.692 ...
##  $ DSCR1_N        : num [1:1080] 0.577 0.545 0.544 0.547 0.537 ...
##  $ AMPKA_N        : num [1:1080] 0.448 0.421 0.405 0.387 0.361 ...
##  $ NR2B_N         : num [1:1080] 0.586 0.545 0.553 0.548 0.513 ...
##  $ pNUMB_N        : num [1:1080] 0.395 0.368 0.364 0.367 0.352 ...
##  $ RAPTOR_N       : num [1:1080] 0.34 0.322 0.313 0.328 0.312 ...
##  $ TIAM1_N        : num [1:1080] 0.483 0.455 0.447 0.443 0.419 ...
##  $ pP70S6_N       : num [1:1080] 0.294 0.276 0.257 0.399 0.393 ...
##  $ NUMB_N         : num [1:1080] 0.182 0.182 0.184 0.162 0.16 ...
##  $ P70S6_N        : num [1:1080] 0.843 0.848 0.856 0.76 0.768 ...
##  $ pGSK3B_N       : num [1:1080] 0.193 0.195 0.201 0.184 0.186 ...
##  $ pPKCG_N        : num [1:1080] 1.44 1.44 1.52 1.61 1.65 ...
##  $ CDK5_N         : num [1:1080] 0.295 0.294 0.302 0.296 0.297 ...
##  $ S6_N           : num [1:1080] 0.355 0.355 0.386 0.291 0.309 ...
##  $ ADARB1_N       : num [1:1080] 1.34 1.31 1.28 1.2 1.21 ...
##  $ AcetylH3K9_N   : num [1:1080] 0.17 0.171 0.185 0.16 0.165 ...
##  $ RRP1_N         : num [1:1080] 0.159 0.158 0.149 0.166 0.161 ...
##  $ BAX_N          : num [1:1080] 0.189 0.185 0.191 0.185 0.188 ...
##  $ ARC_N          : num [1:1080] 0.106 0.107 0.108 0.103 0.105 ...
##  $ ERBB4_N        : num [1:1080] 0.145 0.15 0.145 0.141 0.142 ...
##  $ nNOS_N         : num [1:1080] 0.177 0.178 0.176 0.164 0.168 ...
##  $ Tau_N          : num [1:1080] 0.125 0.134 0.133 0.123 0.137 ...
##  $ GFAP_N         : num [1:1080] 0.115 0.118 0.118 0.117 0.116 ...
##  $ GluR3_N        : num [1:1080] 0.228 0.238 0.245 0.235 0.256 ...
##  $ GluR4_N        : num [1:1080] 0.143 0.142 0.142 0.145 0.141 ...
##  $ IL1B_N         : num [1:1080] 0.431 0.457 0.51 0.431 0.481 ...
##  $ P3525_N        : num [1:1080] 0.248 0.258 0.255 0.251 0.252 ...
##  $ pCASP9_N       : num [1:1080] 1.6 1.67 1.66 1.48 1.53 ...
##  $ PSD95_N        : num [1:1080] 2.01 2 2.02 1.96 2.01 ...
##  $ SNCA_N         : num [1:1080] 0.108 0.11 0.108 0.12 0.12 ...
##  $ Ubiquitin_N    : num [1:1080] 1.045 1.01 0.997 0.99 0.998 ...
##  $ pGSK3B_Tyr216_N: num [1:1080] 0.832 0.849 0.847 0.833 0.879 ...
##  $ SHH_N          : num [1:1080] 0.189 0.2 0.194 0.192 0.206 ...
##  $ BAD_N          : num [1:1080] 0.123 0.117 0.119 0.133 0.13 ...
##  $ BCL2_N         : num [1:1080] NA NA NA NA NA NA NA NA NA NA ...
##  $ pS6_N          : num [1:1080] 0.106 0.107 0.108 0.103 0.105 ...
##  $ pCFOS_N        : num [1:1080] 0.108 0.104 0.106 0.111 0.111 ...
##  $ SYP_N          : num [1:1080] 0.427 0.442 0.436 0.392 0.434 ...
##  $ H3AcK18_N      : num [1:1080] 0.115 0.112 0.112 0.13 0.118 ...
##  $ EGR1_N         : num [1:1080] 0.132 0.135 0.133 0.147 0.14 ...
##  $ H3MeK4_N       : num [1:1080] 0.128 0.131 0.127 0.147 0.148 ...
##  $ CaNA_N         : num [1:1080] 1.68 1.74 1.93 1.7 1.84 ...
##  $ Genotype       : chr [1:1080] "Control" "Control" "Control" "Control" ...
##  $ Treatment      : chr [1:1080] "Memantine" "Memantine" "Memantine" "Memantine" ...
##  $ Behavior       : chr [1:1080] "C/S" "C/S" "C/S" "C/S" ...
##  $ class          : chr [1:1080] "c-CS-m" "c-CS-m" "c-CS-m" "c-CS-m" ...

The MouseID column encodes both the mouse ID and the number of the measurement replicate (e.g. "309_1"), which violates tidy-data principles (one variable per column). I therefore split it into MouseID and ExperimentID columns.

# str_split_fixed() returns a character matrix: column 1 = mouse, column 2 = replicate
id_parts <- str_split_fixed(data_mice$MouseID, "_", 2)
data_mice$ExperimentID <- id_parts[, 2]
data_mice$MouseID <- id_parts[, 1]

There are 72 mice in the experiment.
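The split and the count can be illustrated on a handful of IDs. This is a minimal base-R sketch, assuming every ID has the form "<mouse>_<replicate>" with the replicate after the last underscore; apart from 309_1 and 309_2 the IDs are made up, and sub() is used instead of the stringr functions above. On the real data the count would be something like n_distinct(data_mice$MouseID).

```r
# Example IDs in the "<mouse>_<replicate>" format (only 309_1 and 309_2 are real)
ids <- c("309_1", "309_2", "311_1", "320_1", "320_15")

# Mouse ID: everything before the last underscore
mouse_id <- sub("_[^_]*$", "", ids)
# Replicate: everything after the last underscore
replicate <- sub("^.*_", "", ids)

# Number of distinct mice (rows are measurements, not mice)
n_mice <- length(unique(mouse_id))
print(n_mice)
```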

Four variables describe the mice: Genotype, Treatment, Behavior and class. I convert these columns to factors.

data_mice$Genotype <- as.factor(data_mice$Genotype)
data_mice$Treatment <- as.factor(data_mice$Treatment)
data_mice$Behavior<- as.factor(data_mice$Behavior)
data_mice$class <- as.factor(data_mice$class)

The class column combines each mouse's genotype, behavior and treatment, so it defines 8 groups of mice.
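The eight groups are all combinations of three two-level factors. A small sketch that rebuilds the class labels, assuming the abbreviations c/t (control/trisomic), CS/SC (context-shock/shock-context) and m/s (memantine/saline) used in the class column:

```r
# The three binary factors, abbreviated as in the class labels
genotype  <- c("c", "t")    # control / trisomic
behavior  <- c("CS", "SC")  # context-shock / shock-context
treatment <- c("m", "s")    # memantine / saline

# Every combination of the three factors
grid <- expand.grid(genotype = genotype, behavior = behavior, treatment = treatment,
                    stringsAsFactors = FALSE)
classes <- sort(paste(grid$genotype, grid$behavior, grid$treatment, sep = "-"))
print(classes)  # 2 x 2 x 2 = 8 labels
```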

The mice are distributed among the experimental groups as shown in the following table.

## # A tibble: 8 × 2
##   class  number_of_mice_in_the_group
##   <fct>                        <dbl>
## 1 c-CS-m                          10
## 2 c-CS-s                           9
## 3 c-SC-m                          10
## 4 c-SC-s                           9
## 5 t-CS-m                           9
## 6 t-CS-s                           7
## 7 t-SC-m                           9
## 8 t-SC-s                           9
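The code that produced this table is not shown. The essential point is to count distinct MouseID values per class rather than rows, because each mouse contributes several measurement rows. A base-R sketch on hypothetical toy data (with dplyr the same idea would be group_by(class) followed by summarise() with n_distinct(MouseID)):

```r
# Hypothetical long-format data: several measurement rows per mouse
toy <- data.frame(
  MouseID = c("309", "309", "310", "310", "311", "320", "320"),
  class   = c("c-CS-m", "c-CS-m", "c-CS-m", "c-CS-m", "c-CS-m", "t-SC-s", "t-SC-s")
)

# Count distinct mice (not rows) within each class
mice_per_class <- tapply(toy$MouseID, toy$class, function(x) length(unique(x)))
print(mice_per_class)
```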

Only 48.9% of the rows (measurements) have values for all proteins. There are too many NAs to simply drop incomplete rows, so I replace missing values with the mean of the corresponding group.
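The imputation code is not shown above. A minimal base-R sketch of group-mean imputation, assuming the grouping variable is class; impute_group_mean is a hypothetical helper name, and the toy data are made up. On the real data it would be applied to each protein column in turn.

```r
# Replace each NA with the mean of the non-missing values in its group
# (hypothetical helper; returns NaN for a group that is entirely NA)
impute_group_mean <- function(x, group) {
  ave(x, group, FUN = function(v) {
    v[is.na(v)] <- mean(v, na.rm = TRUE)
    v
  })
}

# Hypothetical example: one protein column with two missing values
toy <- data.frame(
  class   = c("a", "a", "a", "b", "b"),
  protein = c(1, NA, 3, 10, NA)
)

share_complete <- mean(complete.cases(toy))  # fraction of rows without any NA
toy$protein <- impute_group_mean(toy$protein, toy$class)
print(toy$protein)
```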

2. Is there a difference in BDNF_N production among classes?

To test whether BDNF_N production differs among the experimental groups of mice, I perform a one-way ANOVA.

bdnfn_model <- lm(BDNF_N ~ class, data=data_mice)
bdnfn_model_anova <- Anova(bdnfn_model) 
bdnfn_model_anova
## Anova Table (Type II tests)
## 
## Response: BDNF_N
##            Sum Sq   Df F value    Pr(>F)    
## class     0.28797    7  18.877 < 2.2e-16 ***
## Residuals 2.33619 1072                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA shows that class is a significant predictor of BDNF_N production (F(7, 1072) = 18.88, p < 2.2e-16).
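As a check, the F statistic can be recomputed from the ANOVA table: it is the ratio of the between-class mean square to the residual mean square, with 8 − 1 = 7 and 1080 − 8 = 1072 degrees of freedom.

$$
F = \frac{SS_{\text{class}} / df_{\text{class}}}{SS_{\text{res}} / df_{\text{res}}}
  = \frac{0.28797 / 7}{2.33619 / 1072}
  \approx \frac{0.041139}{0.0021793}
  \approx 18.88
$$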

Checking the assumptions: Cook's distances.

ols_plot_cooksd_bar(bdnfn_model)

The Cook's distance plot flags some observations as outliers, but the threshold it uses is very low. A common rule of thumb treats an observation as highly influential only if its Cook's distance exceeds 1; no observation reaches that level, so I conclude that there are no critically influential observations.
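The two thresholds can be compared directly with base R's cooks.distance(). A sketch on simulated one-way data (hypothetical, standing in for BDNF_N ~ class), contrasting the lenient 4/n cut-off (as far as I know, the default line drawn by olsrr::ols_plot_cooksd_bar) with the stricter "greater than 1" rule:

```r
set.seed(1)
# Simulated one-way design (hypothetical data, not the mouse measurements)
sim <- data.frame(group = factor(rep(c("a", "b", "c"), each = 30)))
sim$y <- rnorm(nrow(sim), mean = 0.5 * as.integer(sim$group))

fit <- lm(y ~ group, data = sim)
cd  <- cooks.distance(fit)
n   <- nrow(sim)

flagged_lenient <- which(cd > 4 / n)  # low cut-off, flags many points in large samples
flagged_strict  <- which(cd > 1)      # rule of thumb for highly influential points
print(c(lenient = length(flagged_lenient), strict = length(flagged_strict)))
```

Every point above 1 is also above 4/n, so the strict rule always flags a subset of the lenient one.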

Checking the assumptions: model residuals.

bdnfn_model_diag <- fortify(bdnfn_model)  # adds .fitted, .resid, .stdresid, .cooksd, ...
ggplot(bdnfn_model_diag, aes(x = class, y = .stdresid)) + geom_boxplot() + ggtitle("Standardized residuals by class")

The residual plot shows no systematic pattern. There are some outliers (standardized residuals beyond ±2), but the median residual is similar across classes and close to zero. Overall the residuals are not normally distributed, but given the large number of observations (1080) and that the other conditions are met, the ANOVA can still be used.
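The visual check can be backed by a few numbers. A sketch on simulated four-group data (hypothetical), assuming that rstandard() gives the same standardized residuals that fortify() stores in .stdresid: under normality roughly 5% of them should lie beyond ±2, and the per-group medians should be near zero. With n in the thousands, a formal test such as shapiro.test() will flag even small deviations, which is why the large-sample argument above matters.

```r
set.seed(2)
# Simulated four-group layout (hypothetical data)
sim <- data.frame(group = factor(rep(c("a", "b", "c", "d"), each = 50)))
sim$y <- rnorm(nrow(sim), mean = as.integer(sim$group))

fit <- lm(y ~ group, data = sim)
std_res <- rstandard(fit)

share_beyond_2  <- mean(abs(std_res) > 2)          # roughly 0.05 expected if residuals are normal
median_by_group <- tapply(std_res, sim$group, median)
shapiro <- shapiro.test(std_res)                   # formal normality test (n must be 3..5000)

print(share_beyond_2)
print(median_by_group)
print(shapiro$p.value)
```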

Post-hoc tests. Because the ANOVA is significant, I run Tukey's post-hoc test to find which pairs of classes differ.

data_mice_tukey <- glht(bdnfn_model, linfct = mcp(class = 'Tukey'))
data_mice_tukey_sum<-summary(data_mice_tukey)
data_mice_tukey_sum
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: lm(formula = BDNF_N ~ class, data = data_mice)
## 
## Linear Hypotheses:
##                        Estimate Std. Error t value Pr(>|t|)    
## c-CS-s - c-CS-m == 0  0.0030979  0.0055382   0.559  0.99929    
## c-SC-m - c-CS-m == 0 -0.0482717  0.0053905  -8.955  < 0.001 ***
## c-SC-s - c-CS-m == 0 -0.0258249  0.0055382  -4.663  < 0.001 ***
## t-CS-m - c-CS-m == 0 -0.0264852  0.0055382  -4.782  < 0.001 ***
## t-CS-s - c-CS-m == 0 -0.0337570  0.0059400  -5.683  < 0.001 ***
## t-SC-m - c-CS-m == 0 -0.0181541  0.0055382  -3.278  0.02375 *  
## t-SC-s - c-CS-m == 0 -0.0136310  0.0055382  -2.461  0.21290    
## c-SC-m - c-CS-s == 0 -0.0513696  0.0055382  -9.276  < 0.001 ***
## c-SC-s - c-CS-s == 0 -0.0289228  0.0056820  -5.090  < 0.001 ***
## t-CS-m - c-CS-s == 0 -0.0295831  0.0056820  -5.206  < 0.001 ***
## t-CS-s - c-CS-s == 0 -0.0368549  0.0060744  -6.067  < 0.001 ***
## t-SC-m - c-CS-s == 0 -0.0212520  0.0056820  -3.740  0.00474 ** 
## t-SC-s - c-CS-s == 0 -0.0167289  0.0056820  -2.944  0.06481 .  
## c-SC-s - c-SC-m == 0  0.0224468  0.0055382   4.053  0.00134 ** 
## t-CS-m - c-SC-m == 0  0.0217865  0.0055382   3.934  0.00228 ** 
## t-CS-s - c-SC-m == 0  0.0145147  0.0059400   2.444  0.22099    
## t-SC-m - c-SC-m == 0  0.0301176  0.0055382   5.438  < 0.001 ***
## t-SC-s - c-SC-m == 0  0.0346406  0.0055382   6.255  < 0.001 ***
## t-CS-m - c-SC-s == 0 -0.0006603  0.0056820  -0.116  1.00000    
## t-CS-s - c-SC-s == 0 -0.0079321  0.0060744  -1.306  0.89654    
## t-SC-m - c-SC-s == 0  0.0076708  0.0056820   1.350  0.87904    
## t-SC-s - c-SC-s == 0  0.0121939  0.0056820   2.146  0.38516    
## t-CS-s - t-CS-m == 0 -0.0072718  0.0060744  -1.197  0.93264    
## t-SC-m - t-CS-m == 0  0.0083311  0.0056820   1.466  0.82497    
## t-SC-s - t-CS-m == 0  0.0128542  0.0056820   2.262  0.31538    
## t-SC-m - t-CS-s == 0  0.0156029  0.0060744   2.569  0.16769    
## t-SC-s - t-CS-s == 0  0.0201260  0.0060744   3.313  0.02131 *  
## t-SC-s - t-SC-m == 0  0.0045231  0.0056820   0.796  0.99331    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

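The post-hoc test shows that several pairs of groups differ significantly in BDNF_N production. The two control context-shock groups (c-CS-m, c-CS-s) have the highest BDNF_N levels, and each differs significantly from every class outside this pair except t-SC-s. c-SC-m has the lowest level and differs from all other classes except t-CS-s. Among the trisomic groups, only t-SC-s and t-CS-s differ significantly (p = 0.021).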
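With k groups Tukey's procedure tests all k(k − 1)/2 pairs, which is why the glht output above has 28 lines for 8 classes. For comparison, base R offers TukeyHSD() on an aov() fit; this sketch uses the built-in PlantGrowth data as a stand-in, not the mouse data.

```r
# Number of pairwise comparisons among k groups: k(k - 1) / 2
n_pairs <- choose(8, 2)
print(n_pairs)

# Base-R alternative to multcomp::glht, on the built-in PlantGrowth data
fit_aov <- aov(weight ~ group, data = PlantGrowth)
tukey <- TukeyHSD(fit_aov, "group", conf.level = 0.95)
print(tukey$group)  # difference, confidence limits and adjusted p-value for each pair
```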

# Predicted mean BDNF_N for each class with 95% confidence intervals
MyData <- data.frame(class = levels(data_mice$class))
MyData <- data.frame(MyData,
  predict(bdnfn_model, newdata = MyData, interval = "confidence")
)

ggplot(data = MyData, aes(x = class, y = fit)) +
  geom_col(aes(fill = class), width = 0.5) +
  geom_errorbar(aes(ymin = lwr, ymax = upr), width = 0.1) +
  labs(y = "Predicted BDNF_N") +
  ggtitle(label = "Mean BDNF_N by class (95% CI)")

3. Predicting ERBB4_N production from the other proteins with a linear model

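# Keep only the 77 protein columns (drop the IDs and grouping factors by name)
data_mice_wo_factors <- select(data_mice, -c(MouseID, ExperimentID, Genotype, Treatment, Behavior, class))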
corr <- cor(data_mice_wo_factors)
p.mat <- cor_pmat(data_mice_wo_factors)
ggcorrplot(corr,
           hc.order = TRUE,
           type = "full",
           outline.color = "white",
           lab = TRUE,
           lab_size = 8,
           tl.cex = 24,
           p.mat = p.mat)

The correlation matrix shows groups of both positively and negatively correlated proteins.
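Strongly correlated proteins matter for the linear model below, because near-duplicate predictors inflate standard errors. A sketch that lists variable pairs with |r| above a cut-off, on simulated data (the 0.9 cut-off and the column names are arbitrary assumptions):

```r
set.seed(4)
# Simulated "proteins": p2 is almost a copy of p1, p3 is independent
x1 <- rnorm(100)
sim <- data.frame(
  p1 = x1,
  p2 = x1 + rnorm(100, sd = 0.05),
  p3 = rnorm(100)
)

r <- cor(sim)
# Upper triangle only, so each pair is reported once
idx <- which(abs(r) > 0.9 & upper.tri(r), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(r)[idx[, 1]],
  var2 = colnames(r)[idx[, 2]],
  r    = r[idx]
)
print(high_pairs)
```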

erbb4n_model <- lm(ERBB4_N ~ ., data = data_mice_wo_factors)
erbb4n_model_sum <- summary(erbb4n_model)
erbb4n_model_sum
## 
## Call:
## lm(formula = ERBB4_N ~ ., data = data_mice_wo_factors)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.025516 -0.003834 -0.000075  0.003593  0.038233 
## 
## Coefficients: (1 not defined because of singularities)
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      0.0274958  0.0057420   4.789 1.93e-06 ***
## DYRK1A_N        -0.0077591  0.0054504  -1.424 0.154877    
## ITSN1_N          0.0121514  0.0059337   2.048 0.040833 *  
## BDNF_N          -0.0018158  0.0142494  -0.127 0.898624    
## NR1_N           -0.0066853  0.0038133  -1.753 0.079882 .  
## NR2A_N           0.0002810  0.0009857   0.285 0.775635    
## pAKT_N           0.0290597  0.0144957   2.005 0.045262 *  
## pBRAF_N         -0.0488960  0.0222090  -2.202 0.027919 *  
## pCAMKII_N       -0.0001033  0.0004580  -0.225 0.821664    
## pCREB_N         -0.0430923  0.0201837  -2.135 0.033002 *  
## pELK_N          -0.0007096  0.0009696  -0.732 0.464427    
## pERK_N          -0.0023965  0.0026422  -0.907 0.364612    
## pJNK_N          -0.0380089  0.0167366  -2.271 0.023358 *  
## PKCA_N           0.0414818  0.0156256   2.655 0.008063 ** 
## pMEK_N          -0.0075635  0.0162765  -0.465 0.642256    
## pNR1_N          -0.0201548  0.0089399  -2.254 0.024380 *  
## pNR2A_N          0.0053618  0.0044320   1.210 0.226642    
## pNR2B_N          0.0033623  0.0038226   0.880 0.379295    
## pPKCAB_N         0.0019429  0.0019169   1.014 0.311055    
## pRSK_N           0.0155977  0.0088832   1.756 0.079416 .  
## AKT_N            0.0118477  0.0060490   1.959 0.050435 .  
## BRAF_N          -0.0104543  0.0067836  -1.541 0.123604    
## CAMKII_N         0.0243916  0.0133265   1.830 0.067501 .  
## CREB_N          -0.0282832  0.0201232  -1.406 0.160182    
## ELK_N            0.0022419  0.0026323   0.852 0.394574    
## ERK_N            0.0041749  0.0016293   2.562 0.010543 *  
## GSK3B_N          0.0025504  0.0040383   0.632 0.527820    
## JNK_N            0.0037357  0.0209840   0.178 0.858738    
## MEK_N            0.0063418  0.0140818   0.450 0.652554    
## TRKA_N           0.0015545  0.0086639   0.179 0.857645    
## RSK_N            0.0373716  0.0250948   1.489 0.136744    
## APP_N           -0.0001430  0.0083580  -0.017 0.986350    
## Bcatenin_N       0.0010572  0.0028745   0.368 0.713102    
## SOD1_N          -0.0034438  0.0020719  -1.662 0.096798 .  
## MTOR_N           0.0379793  0.0119760   3.171 0.001564 ** 
## P38_N            0.0004134  0.0087135   0.047 0.962165    
## pMTOR_N         -0.0141851  0.0057399  -2.471 0.013627 *  
## DSCR1_N          0.0014968  0.0061796   0.242 0.808664    
## AMPKA_N         -0.0219883  0.0153626  -1.431 0.152661    
## NR2B_N           0.0035785  0.0067541   0.530 0.596352    
## pNUMB_N          0.0088084  0.0128920   0.683 0.494610    
## RAPTOR_N         0.0134691  0.0166112   0.811 0.417646    
## TIAM1_N         -0.0177624  0.0121355  -1.464 0.143596    
## pP70S6_N         0.0020408  0.0038899   0.525 0.599957    
## NUMB_N          -0.0428467  0.0237500  -1.804 0.071520 .  
## P70S6_N         -0.0068535  0.0035596  -1.925 0.054470 .  
## pGSK3B_N         0.0680863  0.0259059   2.628 0.008714 ** 
## pPKCG_N         -0.0072380  0.0013190  -5.488 5.16e-08 ***
## CDK5_N           0.0097937  0.0094400   1.037 0.299768    
## S6_N             0.0002482  0.0043802   0.057 0.954832    
## ADARB1_N        -0.0003681  0.0013816  -0.266 0.789971    
## AcetylH3K9_N     0.0045392  0.0041963   1.082 0.279642    
## RRP1_N          -0.0208306  0.0092596  -2.250 0.024690 *  
## BAX_N           -0.0039991  0.0238661  -0.168 0.866960    
## ARC_N            0.1436985  0.0398005   3.610 0.000321 ***
## nNOS_N           0.0138962  0.0189010   0.735 0.462383    
## Tau_N            0.0510379  0.0112052   4.555 5.89e-06 ***
## GFAP_N          -0.0535168  0.0311494  -1.718 0.086092 .  
## GluR3_N         -0.0206878  0.0125084  -1.654 0.098457 .  
## GluR4_N         -0.0218156  0.0111955  -1.949 0.051621 .  
## IL1B_N           0.0202090  0.0073967   2.732 0.006403 ** 
## P3525_N          0.0880711  0.0160245   5.496 4.92e-08 ***
## pCASP9_N         0.0126000  0.0022052   5.714 1.46e-08 ***
## PSD95_N          0.0175607  0.0021005   8.360  < 2e-16 ***
## SNCA_N           0.0199547  0.0240340   0.830 0.406583    
## Ubiquitin_N     -0.0007648  0.0033327  -0.229 0.818540    
## pGSK3B_Tyr216_N  0.0227687  0.0065117   3.497 0.000492 ***
## SHH_N            0.0085335  0.0128293   0.665 0.506103    
## BAD_N           -0.0463235  0.0159592  -2.903 0.003782 ** 
## BCL2_N           0.0122268  0.0149683   0.817 0.414210    
## pS6_N                   NA         NA      NA       NA    
## pCFOS_N         -0.0490420  0.0166173  -2.951 0.003238 ** 
## SYP_N            0.0215151  0.0065234   3.298 0.001007 ** 
## H3AcK18_N        0.0089403  0.0070752   1.264 0.206663    
## EGR1_N           0.0105255  0.0111806   0.941 0.346722    
## H3MeK4_N         0.0203880  0.0094645   2.154 0.031465 *  
## CaNA_N          -0.0065095  0.0024444  -2.663 0.007869 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.006661 on 1004 degrees of freedom
## Multiple R-squared:  0.8182, Adjusted R-squared:  0.8046 
## F-statistic: 60.25 on 75 and 1004 DF,  p-value: < 2.2e-16
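The summary statistics are internally consistent. With n = 1080 rows and p = 75 estimable predictors (76 proteins minus the aliased pS6_N), the adjusted R² and the overall F statistic follow from R²:

$$
R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
  = 1 - (1 - 0.8182)\,\frac{1079}{1004}
  \approx 0.8046
$$

$$
F = \frac{R^2 / p}{(1 - R^2) / (n - p - 1)}
  = \frac{0.8182 / 75}{0.1818 / 1004}
  \approx 60.25
$$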
ols_plot_cooksd_bar(erbb4n_model)

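No influential observations.

Note that pS6_N has an NA coefficient ("1 not defined because of singularities"). In the str() output its first values are identical to those of ARC_N, which suggests the two columns duplicate each other, so pS6_N is perfectly collinear with the other predictors.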
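In the summary above, pS6_N gets an NA coefficient because of singularities. A sketch on simulated data showing how lm() handles an exactly duplicated predictor and how the aliased terms can be listed; b_copy is a hypothetical stand-in for a repeated protein column.

```r
set.seed(3)
# Simulated predictors; b_copy duplicates b exactly
sim <- data.frame(a = rnorm(20), b = rnorm(20))
sim$b_copy <- sim$b
sim$y <- 2 * sim$a - sim$b + rnorm(20, sd = 0.1)

fit <- lm(y ~ ., data = sim)

# Coefficients that could not be estimated because of exact collinearity
aliased <- names(coef(fit))[is.na(coef(fit))]
print(aliased)

# alias() shows which terms the aliased one is a linear combination of
print(alias(fit)$Complete)
```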

erbb4n_model_diag <- fortify(erbb4n_model)
ggplot(erbb4n_model_diag, aes(x = .fitted, y = .stdresid)) + 
  geom_point() + 
  ggtitle("Residuals vs fitted values") +
  geom_smooth(method = "lm", formula = y ~ x) +
  geom_hline(yintercept = 2, color = "red") +
  geom_hline(yintercept = -2, color = "red")

4. Principal component analysis

5.

  1. Try to build a linear model that can predict the level of ERBB4_N protein production from the data on the other proteins in the experiment (15 points) – run diagnostics on the resulting linear model – explain why this is or is not a good solution
  2. Perform PCA (15 points) – ordination – plot the factor loadings – determine the percentage of variance explained by each component – build a 3D plot of the first 3 components
  3. Search for differential proteins – the creative part of the assignment (15 points) – you can redo the analysis from the paper, but be warned up front that it uses machine learning – one possible approach is constrained (direct) ordination – you can use limma/DESeq2 (the logic of limma is easier to grasp; also, the statistical tests work somewhat differently in limma and DESeq2). Extra points are awarded cumulatively for each sensible idea and its implementation (up to 15 points).
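For the PCA item, a minimal sketch with stats::prcomp, using the built-in USArrests data as a stand-in for the protein matrix (vegan::rda without constraints would also perform a PCA). Scaling is switched on because the variables, like the proteins, are on different scales.

```r
# Stand-in data: the built-in USArrests table (50 rows, 4 numeric variables)
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Share of total variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(100 * var_explained, 1))

# Loadings: contribution of each variable to PC1 and PC2
print(pca$rotation[, 1:2])

# Scores on the first three components, e.g. input for a 3D scatter plot
scores3 <- pca$x[, 1:3]
```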